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Abstract 

Increasing interest in astronomical applications of non-linear curvature wavefront 
sensors for turbulence detection and correction makes it important to understand how 
best to handle the data they produce, particularly at low light levels. Algorithms 
for wavefront phase-retrieval from a four-plane curvature wavefront sensor are devel- 
oped and compared, with a view to their use for low order phase compensation in 
instruments combining adaptive optics and Lucky Imaging. The convergence speed 
and quality of iterative algorithms is compared to their step-size and techniques for 
phase retrieval at low photon counts are explored. 

Computer simulations show that at low light levels, preprocessing by convolution of 
the measured signal with a gaussian function can reduce by an order of magnitude the 
photon flux required for accurate phase retrieval of low-order errors. This facilitates 
wavefront correction on large telescopes with very faint reference stars. 



1 Introduction 

Adaptive optics (AO) systems, following their pro- 
posal by Babcock (1953), have been used success- 
fully on large ground-based telescopes to correct for 
the effects of atmospheric turbulence on incoming 
wavefronts, particularly in the infrared (Beckers, 
1993). The majority of these systems use a Shack- 
Hartmann wavefront sensor (Piatt and Shack, 1971; 
Hartmann, 1900) to derive the wavefront phase er- 
ror data for the AO system, with typically 8 < N < 
64 subapertures across the telescope diameter. 

Lucky Imaging (LI), a term coined by Fried 
(1978), is a technique to provide diffraction-limited 
images from ground-based telescopes. Many short- 
exposure images are taken, and a subset are se- 
lected on the basis of the sharpness of a reference 
star in the field of view. 

In Lucky Imaging applications, which are be- 
ing used increasingly, removal of as much aberra- 
tion as possible is desirable, regardless of the scale- 
length of phase distortions. On telescopes larger 
than 2.5m, Lucky Imaging will only work in the 
visible when combined with some additional de- 
gree of low-order phase correction such as may be 
provided by an adaptive optics system. In such 
a system, when using Shack-Hartmann wavefront 
sensors (SHWFS) the number of lenslets is selected 
based upon the degree of correction required. With 
highly aberrated wavefronts, many more lenslets 
are needed than for less aberrated wavefronts. The 
chosen number of lenslets is very difficult to change 
once the system has been built. Because of the 



fixed number of sub-apertures, and because of their 
fixed focal length, SHWFS systems cannot simul- 
taneously correct for all length-scales of aberration 
equally well. Curvature wavefront sensors (CWFS) 
have been shown to outperform SHWFS systems 
in some circumstances, especially in correcting low- 
and high-order aberrations simultaneously using a 
non-linear curvature wavefront sensor (nlCWFS) 
(Guyon, 2010). 

The curvature wavefront methods are extensions 
of a broad class of phase diversity wavefront sens- 
ing strategies (Gonsalves, 1982). In essence, these 
methods measure the intensity of an aberrated 
wavefront as it propagates through an optical sys- 
tem. Two or more of these intensity measurements 
can then be used to derive the true phase of the 
wavefront entering the system. nlCWFS systems 
measure the wavefront intensity in four planes, 
comprised of two equidistant pairs of planes, par- 
allel to, and at different distances from the pupil 
plane. In the pupil plane the light intensity is typi- 
cally assumed to be uniform. On either side of the 
pupil the light breaks up into a pattern of diffrac- 
tion limited speckles. As light propagates through 
the pupil, a region which changes from bright to 
dark implies a divergent wavefront while one that 
changes from dark to bright corresponds to a con- 
vergent wavefront. An iterative reconstruction al- 
gorithm is then used to reconstruct the pupil phase. 

Two planes used in a traditional CWFS are suf- 
ficient for this technique to work, but using four 
has a potential advantage as it allows the non- 
linear propagation of light through the pupil plane 



to be properly modelled. High-order phase distor- 
tions will introduce intensity changes on a shorter 
length-scale than low-order phase distortions. Us- 
ing a pair of planes close to the pupil plane, and 
a pair of planes far from the pupil plane, different 
length-scales of phase distortion can be recorded 
and corrected (Guyon, 2010). 

The earliest algorithm for phase-retrieval from 
intensity measurements in different planes, the 
Error-Reduction (ER) algorithm (Gerchberg and 
Saxton, 1972), worked with data both in the image 
and Fourier domains. For a high signal-to-noise in- 
put signal, the RMS error in the phase estimate was 
shown mathematically to decrease at every itera- 
tion, although the reduction for each step quickly 
became very small. A series of alternative algo- 
rithms have been proposed, all abstractions from 
the ER algorithm (Fienup, 1982), which viewed a 
portion of the iterative process as a linear function, 
and applied gradient-search techniques. These al- 
gorithms have much faster convergence, but with 
that faster convergence comes the potential for in- 
stability. 

The initial development of the case of four planes 
nlCWFS used an ER algorithm approach to re- 
cover phase from intensity measurements in post 
processing (Guyon, 2010). To facilitate the use of 
the method in real-time, the stagnation of conver- 
gence of the ER method must be overcome with 
one possible method being to use the techniques 
proposed by Fienup. 

The course followed here has been to apply a 
modification to the Fienup algorithms to the case 
of four images planes offering an improvement in 
the convergence rate of the wavefront fitting, par- 
ticularly at low signal-to-noise. This is especially 
important as the ability of a nlCWFS to work effec- 
tively at the lowest signal levels will enable much 
fainter reference stars to be used in the adaptive 
optics correction system. This is important if the 
fraction of the sky accessible for AO assisted sci- 
ence observations is to be maximised. Presently, a 
relatively bright reference star is required by AO 
wavefront sensors, limiting their application using 
natural guide stars to much less than 1% of the sky, 
even when non-linear phase retrieval strategies are 
used (Clare and Lane, 2004). 

Although it is important to consider the compu- 
tational feasibility of each approach to wavefront 
sensing, it is worth bearing in mind that the limit 
of what is possible with computers is a moving 
target. Order of magnitude estimations for what 
is currently possible are useful, but precise values 
based on today's technology will quickly become 
obsolete. Some techniques presented in this pa- 
per are probably beyond the realms of today's sil- 
icon, but nevertheless offer performance enhance- 



ments. These might currently only be interesting 
side-notes, but over the space of the next decade, it 
is likely that Moore's 'Law' will continue its steady 
progress, changing what can and cannot be consid- 
ered as viable options. 

Most approaches, CWFS included, work very 
well with many photons, but in Lucky Imaging 
applications, we can expect to encounter situa- 
tions with relatively few. Reconstruction algo- 
rithms struggle here as a pixel with zero intensity 
can carry no phase information. This means that 
the algorithm is constantly throwing away phase 
data by clamping intensities back to zero. It should 
be emphasised, however, that total phase correc- 
tion is unnecessary. Lucky Imaging has been shown 
to work well on 2.5 m telescopes, where there may 
be seven or eight turbulent cells across the diame- 
ter (D ~ 8r ). Obtaining this level of error is all 
that is required therefore — the statistics will do 
the rest. It is also important to remember that for 
Lucky Imaging we only wish to improve the images 
so that a good percentage are of adequate quality. 
We are not at all trying to achieve a perfect com- 
pensation. 

In §2 instrumental configurations and require- 
ments are described. In §3 the workings of the algo- 
rithm and some possible variations to it , including a 
parallelised implementation, are explained. In §4, 
strategies for dealing with low photon counts are 
presented and quantitatively compared. An exper- 
imental comparison of different iterative algorithms 
is presented in §5, and conclusions are drawn in §6. 

2 Instrument 

The physical setup of a non-linear curvature wave- 
front sensor (nlCWFS) is a little complicated how- 
ever, for the purposes of this phase-retrieval algo- 
rithm, it can be considered to consist of measure- 
ment planes located at distances ±z\ and ±z 2 from 
the pupil plane, as shown in Figure 1. 

Figure 1: Geometry of the measurement planes 
P3 Pi pupil P2 Pi 
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direction of light propagation 

Unless otherwise indicated, all calculations are 
based on a D = 4.2 m annular aperture with in- 
ner diameter 1 m and z\ and z 2 are 1000 km and 
2500 km respectively. All image data are assumed 
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to be 256 pixels square. These values correspond to 
the William Herschel 4.2m telescope on La Palma 
in the Canary Islands. For Lucky Imaging (LI) to 
be a useful technique, the probability pLucky of a 
lucky exposure (defined here as having RMS phase 
error across aperture less than 1 radian) needs to be 
high enough that it might be achieved many times 
within one observing session. Fried 's expression for 

PLucky IS: 



PLucky ^ 5.6 exp 



,'D 

-0.1557 | — 



(1) 



where ro is the Fried parameter. 

Lucky Imaging has been shown to work well on 
2.5 m telescopes, for which, assuming r = 0.35 m 
yields PLucky = 0.002. The target, then, for the al- 
gorithms below is to restore p Lucky to this range for 
larger telescopes. This can equivalently be stated 
as reducing the effective number of turbulent cells 
(across which the RMS phase error is 1 radian) 
across a diameter to approximately 8, or, alterna- 
tively, increasing the effective value of r to ^. 

Whichever analogy is used, the requirement is 
that the RMS phase error is reduced to around 
1 rad in some non-negligible proportion of expo- 
sures. 

To represent phase screens mathematically, the 
screen is broken up into individual pixels each rep- 
resented by a complex number. Light with phase 
angle <p and amplitude A is represented by the com- 
plex number A e 1 ^ . To recover the optical path dif- 
ference (OPD), which is needed to drive deformable 
mirrors for adaptive optic correction, the phase 
must be unwrapped, by adding and subtracting 
multiples of 2ir where appropriate to give a smooth 
function without discontinuities. The pixel spacing 
in these simulations is equivalent to d = 0.05 m in 
the telescope pupil and the wavelength A = 750 nm. 
Input phase is generated from an implementation 
by Schmidt (2010, p. 170) of a technique described 
by Harding, Johnston, and Lane (1999) for gen- 
erating phase screens from the Kolmogorov model 
of turbulence, (Kolmogorov, 1941a; Kolmogorov, 
1941b) using L = 100 m, l a = 0.01m and r = 
0.5 m. These correspond to the outer scale size, 
the inner scale size and the turbulent uncorrected 
cell size. 

The value of ro represents an unrealistic sce- 
nario for uncorrected wavefronts. It was not chosen 
through optimism but for computational simplicity 
as the rudimentary phase unwrapping process often 
failed (even for well-reconstructed wavefronts) with 
lower values of r , rendering meaningful algorithm 
comparison difficult. The mean RMS optical path 
difference in the input was approximately 500nm, 
which represents a realistic scenario of what might 



happen when running the AO system in 'closed 
loop' configuration. The algorithm did success- 
fully converge with larger aberrations (lower r ) 
and theoretical calculations suggest that the algo- 
rithm ought to provide useful output for values of 
ro around 10 cm (see Figure 4). 

The proposed optical setup involves splitting the 
beam into 4 different colour bands using dichroics 
(so each near-pupil plane image in fact is formed 
through a different pass band), but this is not im- 
portant for the consideration of this paper. It is of 
note, however, that as a result of the efforts made 
in instrumentation to ensure the achromaticity of 
the optical system, the light is assumed to be of 
one wavelength only. Our simulations sugg est that 
this is a good approximation particularly for the 
low order corrections that we wish to achieve. 



3 Algorithm 

3.1 Summary of algorithms 

The iterative algorithms presented here share the 
same broad structure. The inputs are m p , the mea- 
sured amplitudes at each plane pi, and, optionally, 
9o(Pi)j the initial phase candidates. 

Conceptually, the algorithms involve using 
known constraints to recursively refine an estimate 
of the pupil phase. These constraints are the mea- 
sured intensities in the four measurement planes 
and the pupil shape. An accurate estimate for 
the pupil phase must reproduce (upon simulated 
propagation) the measured intensities at the four 
planes, with zero intensity outside the aperture of 
the pupil. It may be possible to assume uniform 
pupil illumination and use this as a further con- 
straint, although this is likely to present problems 
at low photon counts, when the pupil will have few 
enough points illuminated that the assumption of 
uniformity breaks down. 

The algorithm begins by making an estimate of 
the current pupil phase. A numerically simulated 
propagation is then performed, to give an estimate 
for the phase and amplitude of the wavefront at 
one of the measurement planes based on the pupil 
estimate. The amplitude estimate at each pixel is 
then discarded and replaced with the amplitude de- 
rived (by square-rooting) from the measured inten- 
sity. The phase estimate is retained. This new esti- 
mate is propagated to another measurement plane, 
where again the phase estimate is retained, whilst 
replacing the estimated amplitude with the mea- 
sured one. This process of propagating between 
planes is repeated until a given number of itera- 
tions have been performed, or until the estimate 
has completely succeeded (or completely failed) to 
converge. 
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Mathematically, the algorithms are described as 
follows, where Aij indicates a propagation from 
plane i to plane j: 

9k(Pj) = [g k (pi)\ (2) 
o 

9k(Pj) = a 3,(k-m) g'miPj) + b j,(k-m) 9 m {P 3 ) 

m—k 

(3) 

9k+i(Pj) = ar S \SkiPj)] abs l m p\ ( 4 ) 

Most of the a,i and bi are zero, and, since multiply- 
ing each dj, bi by a constant has no effect, we place 
an arbitrary restraint that 

5^(oi + 6i) = l. (5) 

In Table I and all subsequent descriptions, only the 
non-zero ai,bi are given. 

Table 1: Summary of parameter values for different 
algorithms, h is a feedback parameter and represents 
the fraction of the phase used from input in the next 
estimate. 



Name 


Parameters 


Error- Reduction 


a 


= 1 


Input-Output 


a 


= h, bo = 1 — h 


Output-Output 


a 


= 1 + h, a\ = —h 


In-Out-Out 


a 


= h, a\ = —h, b = 1 



The approach of Fienup (1982) was only designed 
for use in the 'linear' region of the propagated wave- 
front, defined as the region where a small change in 
phase at a certain point in the 'input' gives rise to 
a small change in the phase at the corresponding 
point in the 'output' and nowhere else. The algo- 
rithm which has been developed here works in the 
non-linear region, despite not being designed to do 
so, although a formal analysis of why this is the 
case is not the subject of this paper. 

3.2 Algorithms for use in four planes 

In extending the standard algorithms to four 
planes, several variations are possible, such as the 
order in which propagation is done between the 
planes. Clearly, with two planes, there is only one 
scheme possible, but, eliminating cyclic and reflec- 
tional symmetries in the visiting order, with four 
planes there are 4!/(4 • 2) =3 possible visiting cy- 
cles, as shown in Table 2. 

Since pairs (pi,Pz) and (p3 7 P4) shown in Figure 
1 contain similar length-scale information, it can be 
assumed that Schemes 1 and 2 will be broadly sim- 
ilar. However, we might expect Scheme 3 to behave 



Table 2: Visiting orders for four planes 

Scheme 1: (1,2,3,4) 
Scheme 2: (1,2,4,3) 
Scheme 3: (1,3,2,4) 



slightly differently, as each propagation moves to a 
plane with different length-scale information. 

A second possible variation is the choice of what 
type of 'step' is applied at each plane. The Input- 
Output (10) algorithm starts with an estimate of 
the input phase. This is then propagated to gen- 
erate a phase at the output position. This out- 
put phase will then be used for the next iteration 
generally combined with some fraction of the in- 
put phase. It is possible to apply an 'input-output' 
step after four propagations (i.e. once every itera- 
tion), after every two propagations (twice per iter- 
ation), or after every iteration (four times per iter- 
ation). These different algorithms will be referred 
to as 101, 102, and 104 respectively, and use a 
similar naming convention for the Output-Output 
(00) and In-Out-Out (100) algorithms. In each 
case, all 'ordinary' steps are ER steps. Further- 
more, it is possible to use a different value of h at 
the different planes, for example h — hi at p\ and 
P2, h = hi at pz and p^. 

3.3 Effect of arbitrary phase differ- 
ence 

The algorithms will, of course, recover only the rel- 
ative phase, as there is no reference phase informa- 
tion available. Therefore, an arbitrary phase differ- 
ence — generally different for each iteration, but 
constant across the plane — will be present in each 
algorithm at each plane. 

In the development of these algorithms, only a 
basic attempt has been made to correct for the ar- 
bitrary phase difference introduced at each plane. 
Each of the initial estimates at the measurement 
planes has been assigned a uniform phase angle by 
multiplying the zero-phase values by e lkz , where 
k = 2tt/\ is the wavenumber. With an ER algo- 
rithm, the overall phase difference between itera- 
tions is unimportant — the phase of the previous 
iteration is discarded and replaced. When the dif- 
ference between iterations is used to calculate the 
new phase, however, (i.e. all algorithms except ER) 
a problem arises if there is an additional arbitrary 
phase <p a . 

The direction of the propagation chosen will be 
influenced by this arbitrary phase: whether this is 
catastrophic for the convergence of the algorithm 
depends on the probability distribution of <p a . If 
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Figure 2: Schematic diagram of the four-plane reconstruction method. The step in which the new candidate is 
calculated is different in each algorithm. 
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4> a ~ uniform[— 7r, it] (the phase angle introduced 
between iterations is distributed uniformly over all 
angles), the reconstruction will become a random 
walk for some (larger) values of h; adding a uni- 
formly distributed random angle to any angle yields 
a uniformly distributed random angle, due to the 
fact that the angle 'wraps around'. The phase er- 
ror at each step due to this arbitrary phase can be 
found by comparing the phase angle with and with- 
out it. The calculation of this error 4> e for the 10 
algorithm is as follows. The assumption below is 
that Ak — Ak+i, where is the amplitude of a 
given pixel at the k th iteration. <f> c — <f>k+i — 4>k is 
the phase correction at the k th iteration. 

For the effect of the arbitrary phase difference 
not to hamper the convergence of the algorithm, 
the phase must be distributed more tightly than 
uniform [— 7r, 7r]. Inspection of the function in (8) 
reveals that for good convergence, either b or <f> a 
must be small, the latter in the 'small angle ap- 
proximation' sense. It would appear that in most 
cases, the requirement holds, or at least approxi- 
mately holds, but it is possible to directly test the 
distribution of <p a , and to experimentally compare 
its magnitude to 4> c — see Figure 3. An estimation 
of <p a is possible, if it is small, by taking the mean 
of the phase difference between iterations, weighted 
by the intensity of the signal: 

, £j(0(k+l),j - faj) 1 ] 



arg[z], where 



(6) 



(7) 



+ h(e l 



i<t>k+i4>c 



e i<f>k _|_ Ji (gi<)>k+i<j>c+i<f>a 

e l<t >« (l + h(e i ^ -1)) 



= 1 



6(1 - e^) 



e i<pa+i<t>c _|_ 5(g#a+i0c _ X) 



1 + e 



= 1-1- e-K^t+dc) 



1 + 6(1 - e -*Wa+0c)) 

-2ibsm($A 
1 + 6(1 - e-'W^+W) 



(8) 



1 + /i(e^+^ - 1) 



[let b=h-l] 



Correcting for this phase, therefore, may help in 
improving the stability of the algorithm at higher 
values of h — in practice, however, calculation of 
4> a must be computationally efficient and quick, 
or fewer iterations of the algorithm will be possi- 
ble in a preordained time, potentially negating the 
benefits of its correction. It is also possible that, 
given the simple scheme of calculating </> a , assuming 
<t>(k+i),j-<i>kj = arg [eWcfc+DJ-^M)], (i.e. without 
phase unwrapping) , a miscstimation is likely, which 
could increase the error it seeks to avoid. Consider 
the scenario of an arbitrary phase of 4> a ~ 7r added 
to actual corrections of order ^0.1. The calculated 
value of 4> a will be ~ if the real value of <\> a is suf- 
ficiently close to 7r. Fortunately, the actual value 
of 4> a , as can be seen from Figure 3, is an order 
of magnitude lower than (f> c in the case with am- 
ple light. Assuming this size difference is similar 
at low light-levels, correcting for <p a will not be of 
great importance for algorithm performance. 

3.4 Propagation Techniques 

The Fresnel integral describing the propagation of 
a known wavefront through space can be expressed 
in the form of a two-dimensional Fourier transform 
and two multiplications. Discretising the Fourier 
transform and using Fast Fourier Transform (FFT) 
techniques gives a computationally efficient way of 
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Figure 3: Progress of 104 [h = 1.1) for 30 iterations. Determination of (j> a and cj> c was at pi. 




simulating the propagation of a wavefront between 
two parallel planes. The discrete nature of the cal- 
culation fixes the relative grid spacing of the sam- 
pling at the two planes so in order to use this sim- 
ple one-step technique, (so-called because it only 
requires one FFT) the grid spacings of the planes 
must have a particular relationship, or the distance 
between them must have a specific value. The grid 
spacings of the measurement planes is fixed by the 
pixel size of the imaging equipment, so the distance 
between planes is fixed. There is no guarantee that 
arranging the measurement planes at these precise 
relative distances gives good performance in a re- 
construction algorithm. 

In Schmidt's book, two more flexible numeri- 
cal propagation techniques are outlined (Schmidt, 
2010, Chapter 6). Schmidt explains four theoretical 
sampling constraints, determining the limiting re- 
lationships between grid size, grid spacings at each 
plane, and propagation distance. 

The first technique the two-step Fresnel prop- 
agation, involves splitting the required propaga- 
tion into two steps, each step involving one Fourier 
transform (FT). When, as here, the grid spacings of 
the two planes are equal, the technique is equivalent 
to propagating from the first plane to a plane half 
way between the first and second planes, and then 
on to the second plane, with the grid-spacing at 
the intermediate plane being chosen to satisfy the 
one-step constraint on distance and grid-spacing. 
Because the CWFS planes are symmetric about 
the pupil plane, when propagating from p\ to P2, 
or from p 3 to p^ (see Figure 2), this intermediate 
plane is the pupil plane, albeit with a different grid- 
spacing from the other planes. Using schemes 1 or 2 
(see Table 2), allows us to impose the pupil shape 
constraint twice per iteration with no extra FTs. 
To do this would otherwise add 50% to the run- 
ning time of each iteration. 

The second technique, the angular spectrum 



method, also involves two FTs, but here there is 
no intermediate plane, so it is impossible to imple- 
ment any clever tricks and impose the pupil shape 
'for free'. Additionally, the sampling constraints 
to avoid frequency aliasing are such that the tech- 
nique works best only for small propagation dis- 
tances, whilst the Fresnel method works best only 
for large distances. Because the power spectrum of 
the phase error due to atmospheric turbulence has a 
greater value at lower frequencies. The high-order 
structures which appear in the working region of 
the angular spectrum method will contribute much 
less to the error than the low-order structures vis- 
ible in the working region of the two-step Fresnel 
method. The exception to this rule is when ex- 
tremely coarse binning is used, such as may be the 
case at low light-levels (see, §4). 

Because of these effects, algorithms use Fresnel 
propagation and visiting scheme 1 in Table 2, as the 
speedup obtained from imposing the pupil shape 
'for free' is more effective than any possible faster 
convergence from visiting the planes in a different 
order. The implementation presented here requires 
eight 256 x 256 FTs for each iteration (two per prop- 
agation), and it is this which is expected to be re- 
sponsible for the majority of the execution time. 
If, as is seen in some instances, an algorithm con- 
verges to a solution in 5 iterations, this requires 40 
256 x 256 FTs. 

3.5 Grid spacing 

Unlike with a SHWFS, when using a nlCWFS each 
grid point in the pupil plane holds phase informa- 
tion, rather than phase slope information for each 
sub-aperture. As a result of this, once the true 
(i.e. unwrapped) phase difference between a region 
of neighbouring grid points is greater than 7r, any 
phase-unwrapping algorithm, or even any phase- 
slope fitting method, will struggle to identify the 



6 



true unwrapped phase. 

The expected value of the squared phase differ- 
ence between two points is expressed (Fried, 1965) 
in terms of the Fried parameter r as: 



I>*(|ri-r 2 | 



6i 



|n - r 2 \ 
ro 



5/3 



(9) 



This expression represents an expectation value 
- the maximum value may be much larger, but 
is less likely to occur. If the errors in the recon- 
struction algorithm add a phase error at each point 
that is normally distributed about with variance 
a 2 = 2 , and the true phase difference between 
grid points is normally distributed about with 
variance a 2 — D(d), then the reconstructed phase 
difference between adjacent grid points will be dis- 
tributed with mean and variance a 2 — 29 2 +D(d). 
The error in the algorithm will cause wrap-around 
between pixels with probability p wrap if: 



a useful upper bound on d in terms of ro . For exam- 
ple, with an effective closed-loop Fried parameter 
of 1 m, an error of around 0.7 rad in the reconstruc- 
tion algorithm would allow a grid spacing of 20cm 
if 1% was an acceptable likelihood of wrap-around 
effects. 

Clearly the spatial extent of the grid must be 
at least as large as the size of the pupil, or not 
all phase information will be recovered. An aber- 
rated wavefront will spread spatially on propaga- 
tion. A finite fourier transform makes the assump- 
tion that the image is periodic on its boundaries. 
For the finite size of the grid not to cause overlap- 
ping ("wraparound") at the edges of the propagated 
images, the pupil plane image must therefore be 
padded with zeros. We define the padding factor 
to be M pa dding = ^jt — this is the linear ratio of 
the grid extent to the pupil extent — and choose a 
value such that the linear grid size N is a power of 
two. 



2 (crfc \p wrav )f (29 2 + D(d)) > tt 2 . (10) 3 . 6 Parallelisation 



where erfc is the complementary error function. 
The grid-spacing d permissible is then given by: 



(i.8S [ - 

So 



d < 0.314 r 



5/3 



+ 2# 2 < I 



2 Verfc 1 (p wrap ) 



2 Verfc 1 (p wrap ) 



29 2 e ) 



3/5 



(11) 



This constraint on d is shown in Figure 4. In a 
closed loop situation, this condition is less restric- 
tive, as the effective r is increased. 

Figure 4: Constraint on d needed to avoid wrap- 
around in the recovered phase 
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The error-dependence of this constraint is unsur- 
prising — we would reasonably expect that any er- 
ror in the reconstruction must be compensated by a 
finer grid of sampling points — but it does provide 



Two-dimensional FFTs can be effectively paral- 
lelised and executed on Graphics Processor Units 
(GPUs). The use of GPUs opens the additional 
possibility of using a parallelised phase retrieval al- 
gorithm. In this case, the flow of execution is some- 
what more complicated, and many possibilities are 
available. A basic approach has been explored here. 

Because of the serial nature of text, this concept 
is best explained using a diagram (Figure 5). 

When the parallel algorithm terminates, a choice 
of four different outputs (each of the four candi- 
date solutions, one from each measurement plane) 
is available. Using all four estimates of the pupil 
phase and combining information from them for 
phase recovery is likely to result in a better esti- 
mate being achieved. 

4 Low light levels 

At high frame rates and with faint reference stars, 
low photon counts in the imaging planes may be ex- 
pected. Typically, data binning is used when light 
levels drop, however, the sampling constraints in 
the Fourier domain for the Fresnel propagations 
used in the algorithms mean that to reduce the 
grid-size used in propagations, the measurement 
planes must be at greater distances from the pupil 
plane. The distance Az of the measurement plane 
from the pupil plane must satisfy: 



Az > 



(D Mpadding) 2 

XN 



(12) 



where D is the aperture diameter, M pa dding is the 
padding factor, A is the wavelength, and N is the 
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Figure 5: Conceptual diagram of a parallelised algorithm. Arrows represent propagations. The vertical direction 
represents both time (increasing downwards) and error (decreasing downwards, theoretically). Each column 
represents a plane, and each of the four threads execute only one type of propagation (e.g. propagating from pi 
to P2) and the subsequent fo-sized step, indicated by a dotted line. 
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grid size (Schmidt, 2010). The angular spectrum 
propagation technique must be used in this case. 
Even if the angular spectrum method is used, spa- 
tial binning still presents a fundamental problem 
for accurate reconstruction of the OPD (see §3.5). 

4.1 Gaussian convolution 

Signals at low light levels are dominated by Pois- 
son noise. Because of this, rather than binning 
the data we preprocess it by convolving it with 
a 2D Gaussian function (essentially 'blurring' the 
image). Gaussian convolution (GC) has the ad- 
vantage of preserving the spatial information that 
would be lost by binning, and ensures that all pix- 
els have non-zero intensities. Binning can still leave 
some pixels with zero intensity. 

The speckle size at distance z from the pupil 
plane is given by 4^, so a GC of around this 
size could reasonably be expected to perform well. 
Since z is different at each plane, it would make 
sense to use a different sized GC for planes at ±z 2 
from ±z\, which will be referred to as DGC (dif- 
ferential GC). Both of these techniques require the 
execution of two FFTs. 

4.2 Triangular Filling 

With a grid size of 256 x 256 pixels, and low photon 
counts of O(10 4 ) or fewer (i.e. fewer photons than 
pixels), we can expect that the majority of pixels 
will receive no photons, and a small number will re- 
ceive one photon (possibly more) . We need to infer 
intensity from photon density in this region. Tech- 
niques exist for this purpose (Willett and Nowak, 
2004), although many assume regions of uniform 
intensity, separated by smooth boundaries, which 
is not true of the intensity measurements we ex- 



pect to make at low light levels. We propose the 
following approach be used at each measurement 
plane at extremely low photon counts: 

1 . Let V be the set of pixels with non-zero inten- 
sities. 

2. Compute T, the Delaunay triangulation of V. 

3. For each triangle in T, set pixels of this tri- 

3 

angle to 4 ^ I n w n , where /„ is the value at 

n=l 

vertex n, w n is a weighting factor, equal to the 
portion of the pixel which the triangle covers, 
and A is the pixel area of the triangle. 

This technique of area-weighted triangulation fill- 
ing (AWTF) should be well-suited to GPU imple- 
mentation. Rendering triangle meshes is a task 
that GPUs are well-designed for, although the time 
taken depends upon options such as anti-aliasing 
techniques as this has not been verified experimen- 
tally. Delaunay triangulation for n points can be 
performed in O(nlogn) time. Fortunately the sce- 
nario in which AWTF is needed the most (fewest 
non-zero pixels) is also the scenario in which it ex- 
ecutes fastest. AWTF can be used in conjunction 
with GC, in that order. 

4.3 Voronoi Filling 

Another polygon-based technique may have even 
better performance, however: the Voronoi diagram 
or tesselation of the non-zero points. Voronoi tes- 
selation has been used in image reconstruction 
in conjunction with Bayesian methods (Cabrera, 
Casassus, and Hitschfeld, 2008). The method pre- 
sented here is somewhat simpler than that pre- 
sented by Cabrera, Casassus, and Hitschfeld how- 
ever, in that the non-zero pixels are used to com- 
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pute the tesselation, rather than any more com- 
plicated method. The steps used to compute the 
area-weighted Voronoi filling (AWVF) are: 

1 . Let V be the set of pixels with non-zero inten- 
sities. 

2. Compute V, the Voronoi tesselation of V 
(bounded by the image limits). 

3. For each polygon p in V, set pixels within this 
polygon to where I is the intensity of the 
pixel inside p and A is the pixel area of p. 

Voronoi filling has the unfortunate property of re- 
moving all of the localisation (of intensity) from the 
image, and this causes it to be a poor technique to 
use on its own. Using functional notation, the fol- 
lowing scheme somewhat mitigates this problem: 

v/DGC4(AWVF(/)) • DGC4(7) (13) 

The approach shown in (13) , which will be referred 
to as Gaussian- Voronoi (GV), has the advantage 
of preserving the intensity localisation, but adjusts 
the intensity proportionally to the linear density of 
observed photons (the inverse of the distance be- 
tween them). 

4.4 Guiding Lucky Imaging selection 
in low light 

Lucky Imaging requires the selection of images 
based on some criteria of sharpness. If the recon- 
struction of the pupil phase is poor, the AO cor- 
rection applied will not adequately correct the at- 
mospheric error and the science image will not be 
sharp. Measures available during the convergence 
of the algorithm can give an indication of how suc- 
cessful convergence has been. These measures may 
then be used to guide the selection of images for 
combining using LI techniques. One such measure 
is the intensity-weighted variance of the phase dif- 
ference between successive iterations of the algo- 
rithm at any of the measurement planes. If this is 
high, the convergence is either poor or unfinished. 
This measure could either be used during capture 
or retrospectively to select exposures for LI. 

4.5 Comparison of low-photon-count 

strategies 

In simulations of various light levels, the same re- 
construction strategy — 30 iterations of the 104 
algorithm with parameter h — 1.08 — was used for 
all inputs. 100 independent phase screens as de- 
scribed in §2 were used to generate complex pupil 
functions, which were then propagated to the four 
image planes. Poisson noise was then added to the 



intensity (amplitude squared) images, which was 
used as the input to the correction techniques or to 
the algorithm directly. The resulting pupil phase 
was then unwrapped and the RMS error (i.e. the 
RMS value of the difference between the unwrapped 
reconstructed phase and the input phase) was com- 
puted. Naturally this incorporates into the results 
the successes (and failings) of the unwrapping al- 
gorithm, although comparing the wrapped phases 
would limit the range of error values to ±ir rad. It is 
therefore preferred to the alternative of no unwrap- 
ping. In the low photon-count limit however, the 
unwrapping stage frequently partially failed, due to 
lack of sharpness of the phase in the pupil plane. 
This in turn gave a high RMS error due to the spu- 
rious jumps of multiples of 2ir. In these instances, 
the output RMS error was higher than the RMS 
error of the input, which is clearly not desirable or 
representative. Using a more robust unwrapping 
technique or a Zernike phase-slope technique might 
allow unwrapping to be successful at low photon 
counts, enabling further study of this region. By 
dividing the aberrated complex pupil function by 
the reconstructed complex pupil function, clamp- 
ing the amplitude to 1 for all pupil plane pixels, 
and calculating the resulting point spread function 
(PSF), a somewhat clearer indication of the true 
performance is available compared with what can 
be obtained by the limited unwrapping techniques 
available. A comparison of the central portions 
of PSFs for a randomly generated input phase is 
shown in Figure 6. Here, the relative performance 
of the low light techniques is clearly visible. 

There is a clear advantage, shown in Figure 7, 
of applying some correction to the measured in- 
tensity images before the reconstruction occurs. 
DGC is particularly fast to execute, either on a 
GPU or within a standard PC. The good perfor- 
mance of DGC is perhaps because, unlike using spa- 
tial binning, the location information of the pho- 
tons counted is preserved, albeit slightly 'smeared'. 
This allows the nlCWFS to more accurately de- 
termine the phase, especially the low order modes, 
since these are more spatially spread regardless of 
light level. AWTF offers a slight improvement over 
DGC, although it will execute more slowly given 
the computational complexity of the technique. 

All of the techniques investigated offer improve- 
ments at low photon counts at the expense of accu- 
racy at high photon-counts. This doesn't present a 
problem however, as these corrections can be acti- 
vated at will, and are not 'built in' to the algorithm 
itself, merely added at the start. A good estimate 
of the crossover points at which each approach be- 
comes best-suited is all that is needed to ensure 
optimum performance. 

As can be seen in Figures 8-10, the level of blur- 
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Figure 6: Central portion of (e) the ideal Point Spread Function, and the PSF of (a) the uncorrected wavefront, 
and wavefronts corrected using (b) DGC8 + 5, (c) AWTF, DGC4 + 5, (d) GV + 4, where '+ n' denotes n 
iterations of 104 with h — 1.08. RMSE in the input was 590 nm and there were 393 photons. 
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Figure 7: Simulated low light performance of CWFS using algorithm 104 with and without preprocessing the 
input planes with DGC3 and AWTF (see text). Dotted lines show the upper and lower quartiles. AWTF is only 
used below 2000 photons. Vertical lines show the I-band magnitude required for a given photon count with the 
CWFS running at 10 Hz for D — 4.2m (lower label), and D = 10.5m (upper label). 
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ring used places a lower bound on the residual 
OPD. The data suggest that smaller amounts of 
blurring allow better correction at higher photon- 
counts; higher amounts of blurring limit the level 
of correction attainable, but increase the likelihood 
of achieving a similar level of correction at low 
photon-counts. Using DGC with 3px outer blur- 
ring and 1.5px inner (henceforth, DGCn refers to 
outer spacing of n px and inner spacing of ^ px), 
10% of inputs were corrected to around 100 nm 
residual RMS, with similar performance for DGC8 
(Figure 9), but DGC16 (Figure 10) gave poorer 
results at this level. Comparing the size of the 
gaussian blur used to the speckle size gives an in- 



sight into this. At ±z%, we expect a speckle size of 
= 44.6 cm ~ 9px. At ±zi we expect a speckle 
size of 17.9 cm ~ 4 px. These values closely match 
those used for DGC8. A possible reason for DGC8 
allowing good corrections at low photon counts is 
its rejection (by blurring) of features smaller than 
the speckle size. 

Preprocessing the input with GV (Figure 11) 
gives improved performance over DGC8 with 300- 
1000 photons. In the PSF shown in Figure 6d, the 
first Airy disc is just distinguishable, although the 
halo is of comparable size to the uncorrected image, 
if somewhat fainter. This is consistent with removal 
of the lower order errors, which is precisely what is 
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Figure 8: Simulated low light performance of CWFS using algorithm 104 preprocessing the input planes with 
DGC3. Vertical lines show the I-band magnitude required for a given photon count with the CWFS running at 
10 Hz for D — 4.2m (lower label), and D = 10.5m (upper label). The curves show the performance for different 
Lucky Imaging selection percentages. 
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Figure 10: Simulated low light performance of CWFS using algorithm 104 preprocessing the input planes with 
DGC16. Vertical lines show the I-band magnitude required for a given photon count with the CWFS running at 
10 Hz for D — 4.2m (lower label), and D = 10.5m (upper label). 
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Figure 11: Simulated low light performance of CWFS using algorithm 104 preprocessing the input planes with 
GV. Vertical lines show the I-band magnitude required for a given photon count with the CWFS running at 
10 Hz for D — 4.2m (lower label), and D = 10.5m (upper label). 
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required for Lucky Imaging. 

From Figure 12, a correction strategy can be de- 
rived, which is shown in Table 3. 



Table 3: A correction strategy using DGC 



# photons 


blur amount (px) 


> 4 • 10 5 





> 4 • 10 3 


2 


> 6 • 10 2 


4 


otherwise 


8 



4.6 Comparing the simulation to ex- 
pected conditions 

With a better model for generating temporally- 
related phase-screens, the simulations could have 
been made more realistic. As each pupil phase was 
randomly generated, there was no relationship be- 
tween successive phases. In practice, however, suc- 
cessive pupil phases will be related. With a con- 
stant wind-speed, the large-scale structure of the 
turbulence, and of the phase error, will move across 
the aperture at a constant speed. Identifying the 
speed and direction of this motion would allow a 
reasonable correction of the low-order phase errors, 
thus reducing the RMS OPD of the light entering 
the CWFS. In fact, a whole host of finely-tuned 
motion-estimation algorithms do already exist, em- 
ployed for the compression of video images. Typical 
median wind speeds on good astronomical sites are 
in the region of 8-10 m/s. 

This lack of prior phase estimate in the simula- 
tion decreases the performance of the algorithm. 
However, the selected value of r gave an RMS 
OPD similar to that which it is estimated would 
be encountered in closed- loop AO. 

Roddier gives m = 14.6 + 81og(A Mm ) as an es- 
timate of the limiting magnitude for AO using a 
SHWFS (1999) — a compromise between reducing 
the number of sub-apertures (to get more light per 
subaperture) and the seeing limit that this imposes 
due to the subaperture-size being greater than ro- 
This gives a value of m = 13.6 for A = 750 nm. The 
performance that could be delivered by the present 
technique will allow the use of guide-stars up to 5 
magnitudes fainter than this. 

5 Experimental comparison of 
algorithms 

The focus of this section is the relative perfor- 
mance of the reconstruction algorithms, so only a 
few fixed photon-count values have been used. All 



algorithms perform well in ample light, which is 
not the operating region of LI, so less focus has 
been placed on exploring this region. Where per- 
formance is of interest is in the region where the 
RMSE is of order a few radians, as this is approxi- 
mately the working range for LI. Three light levels 
have been investigated — 100, 1,000, and 10,000 
photons. The pre-processing used for each scenario, 
shown in Table 4, is based on the strategy outlined 
in §4.5 (Table 3). The statistics shown are based 
on 100 independent runs, as in §4. 

Table 4: Pre-processing used for comparing parameter 
values and algorithms 



# photons 


blur amount (px) 


outer 


inner 


100 


8 


4 


1,000 


4 


2 


10,000 


2 


1 



With 10 4 photons, 104 has marginally better 
performance than 004 or 1004, with even the 75th 
percentile having very low error of 40 nm RMS. At 
this light level, 104 performs best for h ~ 1, whilst 
004 seems agnostic to step size, but has a slightly 
increased error in the higher percentiles. This level 
of performance would be important for ordinary 
imaging, but is slightly less important for LI, which 
provides no real advantage in this region. The cu- 
rious dependence of 1004 on h seems to occur at 
all light levels. The data suggest the idea that two 
operating regions exist, one either side of this local 
error maximum. How these regions differ, and why 
it should be that the algorithm converges so poorly 
with intermediate h values is not immediately ap- 
parent. It is possible that this behaviour may be 
related to the arbitrary phase difference discussed 
in §3.3. The wide variation in performance of these 
algorithms suggests that a more exhaustive search 
of the parameter space could yield algorithms with 
better performance. 

When the number of photons is reduced to 1000, 
the 104 algorithm has the highest proportion of 
output residuals below 100 nm RMS, with 10% of 
outputs falling below this value. Even at 25% selec- 
tion for LI, good images would likely be obtained. 
004 and 1004 also perform well at this light-level, 
each behaving as at higher light levels, but with 
higher error, and a lower proportion of outputs near 
the minimum error. 

Upon further reduction of the number of photons 
to 100, algorithm 1004 with h = 5 outperforms 
the other two algorithms for the parameter values 
tested, with the minimum of each percentile lower 
than the corresponding minimum for 104 or 004. 
In designing an AO system, the data in Figure 13 
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Figure 12: DGC performance: The 1% quantile is shown for different amounts of blurring and for GV. For 
DGC amount for the inner planes is half that which is shown for the outer planes. GV is applied only at photon 
counts of 1000 and below. Vertical lines show the I-band magnitude required for a given photon count with the 
CWFS running at 10 Hz for D = 4.2m (lower label), and D = 10.5m (upper label). 
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enable the selection of algorithm and parameters 
best matched to the conditions and goals of use. 

6 Discussion 

Reconstruction of wavefront phase from photon- 
limited defocus images is an unusual task, and there 
does not appear to have been a significant amount 
of research into this very particular application. A 
statistically driven (e.g. Bayesian) reconstruction 
strategy may improve further on the results ob- 
tained here, which would in turn allow for improved 
algorithm performance at low photon-counts. 

The performance at low light levels here suggests 
that a nlCWFS is a good solution to shortcom- 
ings of the SHWFS particularly at low light lev- 
els and low-order phase errors. Even with an in- 
put phase error of 5rad RMS, nlCWFS + DGC 
is able to reduce the phase error into LI territory 
with as few as 1000 photons. If this performance is 
achieved on the Gran Telescopio Canarias (GTC), 
natural guide stars with I-band magnitude 20 could 
be used. With a sufficiently fast implementation 
of GV preprocessing, the limits could extend even 
further into the low light levels, as this technique 
delivers good results even with only 400 photons. 

Although the work presented in §5 has shown 
an investigation of several algorithms, it repre- 



sents only a small subset of possible algorithms 
- a few straight lines through the multidimen- 
sional space of <ij and 6j. In fact, the search here 
has been limited to the 4-brane passing through 
ao = 1, bo = 1, ai = 1, and b\ — 1. An optimisation 
approach could be used to find values of dj, bi that 
perform best at a given photon level. This would 
require considerable computing time, as each pa- 
rameter optimised adds a dimension to the search 
space. As an example, finding optimal parameters 
for cii,bi,i < q to the nearest d, and constraining 
|<2j|,|&j| < c would require comparing the perfor- 
mance of (2g + l^ 2q algorithms. Constraining 
the search space by applying heuristics to deter- 
mine which algorithms might be 'forward-stepping' 
rather than 'backward-stepping' could reduce the 
computational cost. 

At very low light-levels, an 'adaptive' algorithm 
may outperform those investigated here. Such an 
algorithm would, instead of rigidly imposing ampli- 
tude constraints, refine the estimate of the ampli- 
tude constraints, taking into account the fact that 
the input data is subject to Poisson statistics and 
is therefore not a fair representation of the true in- 
tensities at the measurement planes. 

An atmospheric model capable of using previous 
phase aberrations to infer the motions of turbulent 
structures across the aperture would be able to pro- 
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Figure 13: Performance of the algorithms at different light levels. The logarithmic horizontal scale shows the 
parameter h used, whilst the vertical axis shows the RMS OPD after correction (n.b. different scale for each 
row). The absence of points at the 100- and 1000- photon levels in the 1004 algorithm is due to the failure of 
the algorithm to converge to within the tolerance of the error unwrapping algorithm in any of the samples. The 
legend in the lower central panel applies to all of the graphs. 
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vide a much better initial estimate for the phase, 
and therefore the algorithm would start much closer 
to the desired solution. There is, however, a fun- 
damental limit to what information we can deduce 
from a given number of photons. In order to drive 
an n-elcmcnt deformable mirror, we need phase in- 
formation about each of the n elements, which, in 
our 4-plane detector, comes from measurement of 
~ 0(4n) photons. 

It is hoped that further refinement of these al- 
gorithms together with improvements in the pre- 
processing of photon- limited defocus images will al- 
low an extension of AO capabilities down to the 
lowest light-levels. This would make Lucky Imag- 
ing a powerful technique for ground-based imaging 
of faint objects at high angular resolution. 

The authors would like to acknowledge helpful 
discussions and advice from J. R. Fienup. 
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